Diffusion-based deep learning method for augmenting ultrastructural imaging and volume electron microscopy

Electron microscopy (EM) revolutionized the visualization of cellular ultrastructure. Volume EM (vEM) has further broadened its three-dimensional nanoscale imaging capacity. However, intrinsic trade-offs between imaging speed and quality in EM restrict the attainable imaging area and volume, and isotropic vEM imaging of large biological volumes remains unachievable. Here, we developed EMDiffuse, a suite of algorithms designed to enhance EM and vEM capabilities by leveraging cutting-edge diffusion models for image generation. EMDiffuse generates realistic predictions with high-resolution ultrastructural details and exhibits robust transferability, requiring only a single 3-megapixel image pair for fine-tuning in denoising and super-resolution tasks. EMDiffuse also demonstrated proficiency in isotropic vEM reconstruction, generating isotropic volumes even in the absence of isotropic training data. We demonstrated the robustness of EMDiffuse by generating isotropic volumes from seven public datasets obtained with different vEM techniques and instruments. The generated isotropic volumes enable accurate three-dimensional nanoscale ultrastructure analysis. EMDiffuse also features self-assessment of the reliability of its predictions. We envision that EMDiffuse will pave the way for investigations of the intricate subcellular nanoscale ultrastructure within large volumes of biological systems.


Supplementary Fig. 2 Network architecture of the UDiM model.
A traditional U-shaped model comprising an encoder and a decoder is employed at every step of the diffusion chain. Attention is added to the bottleneck of the model to improve model performance.
Supplementary Fig. 6 Uncertainty map for self-assessment of prediction reliability.
EMDiffuse denoising predictions and uncertainty maps of two example regions with different noise levels. Each row shows the noise level, input image, EMDiffuse prediction, uncertainty map, and the 1% of uncertainty pixels that are ignored, consisting of outliers and pixels at structure boundaries.
A threshold of 0.12 on the total uncertainty is used to assess the reliability of the final prediction, as shown in the last column. "Caution" signifies potential inaccuracies in the predicted structure, while "Yes" denotes that the prediction is reliable. Scale bar, 0.1 μm.
Supplementary Fig. 11 vEMDiffuse-i generates XY views with isotropic resolution and captures the continuity of organelles along the z-axis in the Openorganelle liver dataset.
A series of input and predicted XY views (from an 8 nm x 8 nm x 48 nm resolution anisotropic volume) along the z-axis of a liver ultrastructure dataset (jrc_mus-liver). The first column presents the layers of the input (depicted in green) and the vEMDiffuse-i predictions with the uncertainty values (depicted in yellow). The second and third columns show layers of the isotropic volume (GT) and the vEMDiffuse-i predicted isotropic volume for an enlarged region of the first column. vEMDiffuse-i captures the gradual changes in the ultrastructure of the endoplasmic reticulum (ER), consistent with the ground-truth slices. U, uncertainty value. Scale bar, first column, 1 μm; second and third columns, 0.3 μm.

Cell culture. HeLa cells were maintained in complete DMEM medium containing 10% fetal bovine serum (FBS) and 1% penicillin and streptomycin at 37°C under 5% CO2. To prepare cell samples for electron microscopy imaging, cells were seeded onto coverslips in 24-well plates (5 × 10^4 cells per well) and allowed to adhere overnight. Cells were then fixed with 2.5% glutaraldehyde in 0.1 M sodium cacodylate buffer (pH 7.4) for 2 hours and processed for sample preparation.
Sample preparation. Tissues and cells were processed using a modified version of the serial block-face scanning electron microscopy protocol of the National Center for Microscopy and Imaging Research 1. Briefly, following fixation, tissues and cells were rinsed 5 times for 3 minutes each in 0.1 M sodium cacodylate before a 1-hour incubation at 4°C in 0.1 M sodium cacodylate buffer containing 2% osmium tetroxide and 1.5% potassium ferricyanide. Samples were then rinsed and incubated with 1% thiocarbohydrazide for 20 minutes at room temperature. Next, samples were rinsed and incubated with 2% aqueous osmium tetroxide for 30 minutes at room temperature, followed by overnight incubation in 2% uranyl acetate at 4°C. The next day, samples were rinsed and dehydrated through a graded ethanol series (30%, 50%, 70%, 85%, 95%, 100%) for 10 minutes each, followed by two additional 10-minute incubations in 100% acetone. Tissue samples were then infiltrated with Embed812 resin (Electron Microscopy Sciences) by incubation in 33% resin (diluted in anhydrous acetone) for 2 hours, 66% resin overnight, and 100% resin overnight. Samples were embedded in fresh resin using polypropylene molds (Electron Microscopy Sciences) and polymerized in an oven for 48 hours at 65°C. After polymerization, samples were removed from the molds, block faces were trimmed, and 500-nm sections were cut using a Leica UC6 ultramicrotome. For cell samples, the coverslips were inverted onto resin-filled BEEM capsules, followed by polymerization in an oven for 48 hours at 60°C. To remove the coverslip, the resin blocks were immersed in liquid nitrogen and the coverslip was peeled off with tweezers, leaving the cells on the resin surface. The resin blocks were trimmed, and 500-nm-thick sections were then cut as described above.
Scanning electron microscopy. Semithin 500-nm-thick sections were mounted onto silicon wafers. EM images were acquired with the backscattered electron detector of an FEI Verios scanning electron microscope (Thermo Fisher Scientific) at an acceleration voltage of 2 kV. For EMDiffuse-n, the dwell time at each point was varied to adjust the noise level of the raw images. For the denoising task, each region was imaged six times with dwell times of 0.5, 1, 1.5, 2, 4, and 36 µs at 40,000× magnification and a pixel size of 3.3 nm.
Transfer learning datasets. For transfer learning, we focused on three tissues (mouse liver, heart, and bone marrow) and a cultured HeLa cell sample. The images of the three tissue samples have a pixel size of 3.3 nm, and the cultured HeLa cell images have a pixel size of 2.1 nm. The dwell time was set to 2 µs for raw images and 36 µs for ground-truth images. All four datasets were imaged using the FEI Verios SEM at an acceleration voltage of 2 kV. The same registration pipeline was applied to the transfer learning datasets.

EMDiffuse diffusion process
Denoising Diffusion Probabilistic Models in image quality enhancement.
EMDiffuse's fundamental basis is the Denoising Diffusion Probabilistic Model (DDPM) 2, which belongs to a class of deep generative models that generate realistic data samples by reversing a diffusion process. The diffusion process, also known as the forward diffusion process, converts the target data distribution (e.g., the real image distribution) into a tractable probability distribution (e.g., a Gaussian distribution) by gradually adding noise to the data until they become pure noise. DDPM then learns to generate realistic samples from pure random noise by removing noise at each step, thereby reversing the diffusion process. In the following, we elaborate on the forward and reverse diffusion processes.
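Forward process. Given a sample $\mathbf{x}_0$ (e.g., a ground-truth reference image) drawn from the target data distribution $q(\mathbf{x})$, the forward diffusion process gradually adds random Gaussian noise over $T$ discrete steps, producing a sequence of increasingly noisy samples $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$ sampled from $q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)$:

$$q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^{T} q(\mathbf{x}_t \mid \mathbf{x}_{t-1}), \qquad q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{1-\alpha_t}\,\mathbf{x}_{t-1}, \alpha_t \mathbf{I}\right)$$
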
where $\mathbf{x}_t$ denotes the noisy data at step $t$, $\mathbf{I}$ is the identity matrix, and $\alpha_t \in (0, 1)$ is the noise schedule parameter that controls the variance of the noise added at step $t$. The noise schedule is set so that $\mathbf{x}_T$ becomes pure random noise following a standard normal distribution. To this end, $\alpha_t$ starts from a small value close to zero and grows progressively, reaching its maximum at step $T$. We adopt a linear schedule:

$$\alpha_t = \alpha_1 + \frac{t-1}{T-1}\left(\alpha_T - \alpha_1\right)$$

Reverse diffusion process. The reverse diffusion process uses a parametric model, i.e., a neural network with parameters $\theta$, to invert the diffusion process and generate high-quality samples $\mathbf{x}_0$ from pure Gaussian random noise $\mathbf{x}_T$. In our setting, we include the raw input image $\mathbf{x}_c$ (e.g., the noisy input image) as a condition, so that the reverse diffusion process becomes a conditional process:

$$p_\theta(\mathbf{x}_{0:T} \mid \mathbf{x}_c) = p(\mathbf{x}_T) \prod_{t=1}^{T} p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_c), \qquad p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_c) = \mathcal{N}\left(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t, \mathbf{x}_c), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t, \mathbf{x}_c)\right)$$

where $\boldsymbol{\mu}_\theta$ and $\boldsymbol{\Sigma}_\theta$ are denoising functions, realized by a deep neural network with parameters $\theta$, that predict the mean and covariance matrix, and $t$ is the reverse step index.
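The forward process and the linear schedule can be sketched in NumPy as follows. This is a minimal illustration, not the EMDiffuse code: the step count T = 1000 and the endpoints alpha_1 = 1e-4 and alpha_T = 0.02 are assumptions borrowed from the original DDPM defaults, and the function names are hypothetical. The closed form in `q_sample` uses the standard identity x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ᾱ_t = ∏(1 - α_s), which follows from composing the Gaussian steps.

```python
import numpy as np


def linear_noise_schedule(T=1000, alpha_start=1e-4, alpha_end=0.02):
    """Per-step noise variances alpha_1..alpha_T, increasing linearly.

    The default endpoints are assumed (DDPM defaults), not taken from EMDiffuse.
    """
    return np.linspace(alpha_start, alpha_end, T)


def forward_diffusion(x0, alphas, rng):
    """Run q(x_t | x_{t-1}) = N(sqrt(1 - alpha_t) x_{t-1}, alpha_t I) for t = 1..T; return x_T."""
    x = np.asarray(x0, dtype=np.float64)
    for alpha_t in alphas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - alpha_t) * x + np.sqrt(alpha_t) * noise
    return x


def q_sample(x0, t, alphas, rng):
    """Sample x_t directly from x_0 (t is 1-indexed) using the closed form."""
    alpha_bar_t = np.prod(1.0 - alphas[:t])
    noise = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


rng = np.random.default_rng(0)
alphas = linear_noise_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized image
xT = forward_diffusion(x0, alphas, rng)
print(xT.mean(), xT.std())  # close to 0 and 1: x_T is approximately N(0, I)
```

Because the schedule drives ᾱ_T close to zero, almost no trace of x_0 survives in x_T, which is what allows sampling to start from pure Gaussian noise.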
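Following DDPM 2, we use a non-learnable, fixed covariance matrix $\boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t, \mathbf{x}_c) = \sigma_t^2 \mathbf{I}$ and adopt the following parameterization of $\boldsymbol{\mu}_\theta$:

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t, \mathbf{x}_c) = \frac{1}{\sqrt{1-\alpha_t}}\left(\mathbf{x}_t - \frac{\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{x}_c)\right), \qquad \bar{\alpha}_t = \prod_{s=1}^{t}(1-\alpha_s)$$

which uses a network $\boldsymbol{\epsilon}_\theta$ to predict the noise. DDPM empirically found that a simplified version of the variational training objective yields better sample quality in practice. We thus adopt the simplified training objective:

$$L_{\mathrm{simple}}(\theta) = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\left[\left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}, t, \mathbf{x}_c\right)\right\|^2\right]$$

where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.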
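The noise-prediction parameterization and the simplified objective can be sketched as a single training term plus an ancestral sampling loop. This NumPy sketch assumes σ_t² = α_t (one of the fixed choices discussed in DDPM) and uses a placeholder `zero_eps_model` in place of the UDiM network; the placeholder, the function names, and the short 50-step schedule are hypothetical, and batching, time embeddings, and optimization are omitted.

```python
import numpy as np


def alpha_bars(alphas):
    """bar_alpha_t = prod_{s<=t} (1 - alpha_s), returned for t = 1..T."""
    return np.cumprod(1.0 - alphas)


def simple_loss_term(x0, x_c, t, alphas, eps_model, rng):
    """One Monte Carlo estimate of L_simple for a single (x0, x_c) pair; t is 1-indexed."""
    ab = alpha_bars(alphas)[t - 1]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps  # noised target image
    return np.mean((eps - eps_model(x_t, t, x_c)) ** 2)


def reverse_step(x_t, x_c, t, alphas, eps_model, rng):
    """Draw x_{t-1} ~ N(mu_theta, sigma_t^2 I), assuming sigma_t^2 = alpha_t."""
    alpha_t = alphas[t - 1]
    ab = alpha_bars(alphas)[t - 1]
    mean = (x_t - alpha_t / np.sqrt(1.0 - ab) * eps_model(x_t, t, x_c)) / np.sqrt(1.0 - alpha_t)
    if t == 1:
        return mean  # the last step returns the mean without adding noise
    return mean + np.sqrt(alpha_t) * rng.standard_normal(x_t.shape)


def sample(x_c, alphas, eps_model, rng):
    """Conditional sampling: start from x_T ~ N(0, I) and apply T reverse steps."""
    x = rng.standard_normal(x_c.shape)
    for t in range(len(alphas), 0, -1):
        x = reverse_step(x, x_c, t, alphas, eps_model, rng)
    return x


def zero_eps_model(x_t, t, x_c):
    """Hypothetical placeholder for the trained noise predictor epsilon_theta."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
alphas = np.linspace(1e-4, 0.02, 50)                # short illustrative schedule (assumed)
x_c = rng.uniform(-1.0, 1.0, size=(32, 32))         # stand-in for the raw input image
x0 = np.clip(x_c + 0.05 * rng.standard_normal(x_c.shape), -1.0, 1.0)  # stand-in target
print(simple_loss_term(x0, x_c, 25, alphas, zero_eps_model, rng))
x_gen = sample(x_c, alphas, zero_eps_model, rng)
```

Because each call to `sample` starts from fresh noise, repeated calls with the same condition return different plausible outputs, which is what makes averaging several outputs or measuring their spread possible.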

UDiM network architecture
U-Net architecture. We employed a U-Net 3 architecture for $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{x}_c)$, similar to the one implemented in DDPM (Supplementary Fig. 2). U-Net is particularly well suited to image denoising and super-resolution owing to its ability to efficiently capture both local and global contextual information from input images. The U-Net architecture consists of two main components: the encoder (contracting path) and the decoder (expanding path). These two components are interconnected through skip connections, which propagate high-resolution spatial information from the encoder to the decoder. This enables U-Net to generate fine-grained details in the output image, which is crucial for image denoising and super-resolution.
Global attention. Drawing inspiration from SR3 4, we incorporated a global attention layer, akin to the self-attention mechanism in transformers, into the U-Net architecture. This addition improved performance by allowing the model to capture long-range dependencies and contextual information across the entire input image. To reduce computational cost, the global attention layer is placed at the bottleneck of the U-Net, i.e., between the encoder (contracting path) and the decoder (expanding path).
A global attention layer operates as follows. First, three linear layers transform the input feature map, which is derived from the contracting path, into the Query ($Q \in \mathbb{R}^{H \times W \times C}$), Key ($K \in \mathbb{R}^{H \times W \times C}$), and Value ($V \in \mathbb{R}^{H \times W \times C}$) embeddings, where $H$ and $W$ represent the spatial height and width of the feature map and $C$ represents the feature dimension. Next, with $Q$, $K$, and $V$ flattened to $HW \times C$ matrices, we compute the attention as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{C}}\right)V$$

where the softmax operation normalizes the attention weights of each position so that they sum to 1. The attention module refines the feature at each pixel location by softly aggregating information from all spatial locations according to their similarities. This enables the features to effectively capture global information (Supplementary Table 1).
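To make the attention computation concrete, here is a minimal single-head NumPy sketch. The random weight matrices are stand-ins for the learned linear layers, the 1/sqrt(C) scaling follows the standard transformer formulation, and the sizes are illustrative; the actual UDiM layer may differ in normalization, number of heads, and residual connections. The attention-weight matrix has shape (HW, HW), which is why placing the layer at the low-resolution bottleneck keeps the cost manageable.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_attention(feat, W_q, W_k, W_v):
    """Single-head global self-attention over an (H, W, C) bottleneck feature map.

    W_q, W_k, W_v are (C, C) weight matrices standing in for the three linear layers
    (bias terms, normalization, and any residual connection are omitted in this sketch).
    """
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)             # every spatial position becomes one token
    Q, K, V = x @ W_q, x @ W_k, x @ W_v    # Query, Key, Value embeddings
    weights = softmax(Q @ K.T / np.sqrt(C), axis=-1)  # (HW, HW); each row sums to 1
    out = weights @ V                      # aggregate Values from all positions
    return out.reshape(H, W, C), weights


rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
feat = rng.standard_normal((H, W, C))
W_q, W_k, W_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = global_attention(feat, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (8, 8, 16) (64, 64)
```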

Denoising baseline implementation details
CARE 5 software was installed from GitHub (https://github.com/CSBDeep/CSBDeep). The configurations were consistent with the prior application of CARE to SEM 6. The depth of the network was 2. The batch size was 16. The initial learning rate was 4e-4. The mean absolute error (MAE) was used as the loss function.
The PSSR 7 algorithm was sourced from GitHub (https://github.com/BPHO-Salk/PSSR). Because we had acquired real noisy images and registered them with our pipeline, the PSSR crappifier, which synthesizes noisy EM images for training, was skipped. The training hyperparameters followed the PSSR manuscript and GitHub guidelines. The mean squared error (MSE) loss was used as the loss function. The batch size was 64. The One Cycle Policy 8 was adopted as the learning rate schedule, with the maximum learning rate set to 4e-4, and stochastic gradient descent with restarts (SGDR) 9 was used as the optimizer.
RCAN 10 was installed from the BasicSR package (https://github.com/XpixelGroup/BasicSR), a compendium of popular super-resolution algorithms. The upsampling scale of the model was set to 1. The Adam optimizer 11 with an initial learning rate of 1e-4 was used to train RCAN. The batch size was 64. MSE was used as the loss function. The learning rate was halved whenever the validation loss plateaued for more than 10 epochs.
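The "halve on plateau" rule can be written as a small framework-independent scheduler. This is a hypothetical helper for illustration only; in PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)` implements comparable behavior, although its default relative improvement threshold makes it slightly stricter about what counts as an improvement.

```python
class HalveOnPlateau:
    """Halve the learning rate once the validation loss has not improved for more than `patience` epochs."""

    def __init__(self, lr, patience=10, factor=0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0  # restart the count after each reduction
        return self.lr


sched = HalveOnPlateau(lr=1e-4, patience=10)
for epoch_loss in [0.5, 0.4] + [0.4] * 11:
    lr = sched.step(epoch_loss)
print(lr)  # 5e-05
```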
Noise2Noise 12 was adopted from GitHub (https://github.com/joeylitalien/noise2noise-pytorch). Because our dataset contains multiple images of the same region acquired at different noise levels, we used these paired noisy images directly instead of synthetically adding different noise to images. The Adam optimizer with an initial learning rate of 1e-4 was used to train the model. The batch size was 64. MSE was used as the loss function. The learning rate schedule was the same as that of RCAN.
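Pairing repeated acquisitions for Noise2Noise can be sketched as follows. Which acquisitions were paired in these experiments is not stated, so this sketch simply uses all ordered pairs of five simulated noisy acquisitions of one region; the function name and the simulated data are hypothetical.

```python
from itertools import permutations

import numpy as np


def noise2noise_pairs(acquisitions):
    """Build (input, target) training pairs from registered acquisitions of one region.

    Each acquisition is assumed to share the same underlying signal but carry independent
    noise, so any ordered pair of distinct acquisitions is a valid Noise2Noise example.
    """
    return [(a, b) for a, b in permutations(acquisitions, 2)]


rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(32, 32))  # hypothetical underlying signal
stack = [clean + 0.2 * rng.standard_normal(clean.shape) for _ in range(5)]
pairs = noise2noise_pairs(stack)
print(len(pairs))  # 20
```

The approach relies on the noise in the two images of a pair being independent and zero-mean around the same signal, which is the condition real repeated scans of a registered region approximate.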
Noise2Void 13 was adapted from its GitHub implementation (https://github.com/juglab/n2v). The batch size was 64. The Adam optimizer with an initial learning rate of 4e-4 was used to train the model. MSE was used as the loss function. The learning rate schedule was the same as that of RCAN.

Evaluation metrics
Noise level measurement. During data acquisition, the dwell time was adjusted to alter the noise level. However, dwell time is not a universally applicable measure of the noise level of raw images across different microscopes and tissues. To quantitatively measure the noise level of each raw image, we employed the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) 14. This metric quantifies the perceptual quality of an image by examining its naturalness, i.e., the statistical regularities present in the image. It can effectively evaluate the quality of images subjected to distortions such as compression, noise, or blurring without requiring an original, undistorted reference image. We used this metric to evaluate the noise level of raw images during inference in order to optimize how predictions are generated.
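BRISQUE builds its features from mean-subtracted contrast-normalized (MSCN) coefficients, whose statistics deviate from those of natural images when noise or blur is present. The sketch below computes only this first stage, assuming a 7 × 7 Gaussian window with σ = 7/6 and a stabilizing constant C = 1 as in the original BRISQUE description; the subsequent fitting of generalized Gaussian distributions and the regression to a quality score are omitted, so this is not a substitute for a full BRISQUE implementation.

```python
import numpy as np


def gaussian_kernel_1d(size=7, sigma=7 / 6):
    """Normalized 1D Gaussian window (size and sigma are the assumed BRISQUE values)."""
    r = np.arange(size) - (size - 1) / 2
    k = np.exp(-(r ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def smooth(img, k):
    """Separable 2D filtering with reflect padding; output has the same shape as img."""
    pad = len(k) // 2
    p = np.pad(img, pad, mode="reflect")
    # Filter rows, then columns, keeping only fully overlapped ('valid') positions.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def mscn_coefficients(img, C=1.0):
    """(I - local mean) / (local std + C) for a 2D grayscale image."""
    img = np.asarray(img, dtype=np.float64)
    k = gaussian_kernel_1d()
    mu = smooth(img, k)
    var = smooth(img * img, k) - mu * mu
    sigma = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative values from round-off
    return (img - mu) / (sigma + C)


rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 64), (64, 1))   # smooth intensity ramp
noisy = clean + 20 * rng.standard_normal(clean.shape)
print(mscn_coefficients(clean).std(), mscn_coefficients(noisy).std())
```

Noise inflates the spread of the MSCN coefficients relative to a smooth image, which is the kind of statistical deviation BRISQUE's later stages turn into a score.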
PSNR. Peak signal-to-noise ratio (PSNR) is a metric for measuring the quality of generated images with respect to their ground-truth reference images 5,15. It is calculated as the ratio of the peak signal power (i.e., the squared maximum image intensity) to the noise power, which is measured by the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(x(i,j) - y(i,j)\right)^2, \qquad \mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}_y^2}{\mathrm{MSE}}\right)$$

where $x$ refers to the generated image and $y$ refers to the ground-truth image, $m$ and $n$ are the width and height of the image, and $\mathrm{MAX}_y$ denotes the maximum pixel value of the ground-truth reference image $y$. PSNR thus quantifies the pixel-wise intensity mean square error between the generated image and the ground-truth image, and a higher PSNR indicates better image quality. However, PSNR is susceptible to noise in the reference image. In electron microscopy, it is not feasible to capture noise-free images as ground truth (Figure 1b, Figure 2b). Thus, the pixel-wise mean square error and PSNR scores (error maps in Supplementary Fig. 4) do not necessarily reflect result quality, owing to the impact of random noise. In contrast, baseline models trained with L1 or L2 losses predict the median or mean of all plausible target images, approximating the minimum mean square error (MMSE) solution. This induces smoothness in their predictions, which reduces the influence of noise in the reference ground truth when PSNR is calculated. Therefore, we use the Feature Similarity Index Measure (FSIM) 16 and Learned Perceptual Image Patch Similarity (LPIPS) 17, which emphasize structural quality and align well with human judgment. Details of FSIM and LPIPS are discussed below.
FSIM. FSIM 16 is a perceptual quality metric inspired by the human visual system. It assesses the similarity between images by examining their low-level features, such as phase congruency and gradient magnitude. It is calculated as follows:

$$\mathrm{FSIM}(x, y) = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} S_{PC}(i,j)\,S_{G}(i,j)\,PC_m(i,j)}{\sum_{i=1}^{N}\sum_{j=1}^{M} PC_m(i,j)}$$

where $N$ and $M$ are the height and width of the image, respectively. $S_{PC}(i,j)\,S_{G}(i,j)$ denotes the similarity between the phase congruency and gradient magnitude features of pixel $(i, j)$ in images $x$ and $y$. $PC(\cdot)$ represents phase congruency calculated with the filter proposed earlier 18. Phase congruency uses the phase of the image Fourier transform to capture edges and other prominent structures in an image without being influenced by changes in lighting or contrast. $G(\cdot)$ is the gradient magnitude calculated with the Sobel operator 19. Gradient magnitude, on the other hand, reflects local variations in intensity or color within an image, offering insight into edges, textures, and intricate details. $PC_m(i,j) = \max\left(PC_x(i,j), PC_y(i,j)\right)$ denotes the perceptual significance of each pixel, which is determined by the maximum phase congruency of the two images. FSIM is specifically designed to capture structural and textural information in images, which is vital for practical EM applications in biology. By concentrating on these features, FSIM provides a more precise measure of structural similarity between images than PSNR 20,21.
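Of the two FSIM features, the gradient-magnitude term is simple enough to sketch directly. The code below computes Sobel gradient magnitudes, the per-pixel similarity S_G, and the PC_m-weighted pooling of the FSIM equation. Phase congruency requires a bank of log-Gabor filters and is not implemented here: `fsim_from_maps` assumes the two phase-congruency maps are supplied by some other routine (a hypothetical input). The stabilizing constants T1 = 0.85 and T2 = 160 are the values suggested in the original FSIM paper for 8-bit images and are assumptions with respect to EMDiffuse.

```python
import numpy as np


def sobel_gradient_magnitude(img):
    """Gradient magnitude from 3x3 Sobel responses (edge-replicated borders)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal response: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical response: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def gradient_similarity(x, y, T2=160.0):
    """Per-pixel gradient-magnitude similarity S_G in (0, 1]."""
    g1, g2 = sobel_gradient_magnitude(x), sobel_gradient_magnitude(y)
    return (2 * g1 * g2 + T2) / (g1 ** 2 + g2 ** 2 + T2)


def fsim_from_maps(pc_x, pc_y, s_g, T1=0.85):
    """Pool S_PC * S_G with weights PC_m = max(PC_x, PC_y), given phase-congruency maps."""
    s_pc = (2 * pc_x * pc_y + T1) / (pc_x ** 2 + pc_y ** 2 + T1)
    pc_m = np.maximum(pc_x, pc_y)
    return float(np.sum(s_pc * s_g * pc_m) / np.sum(pc_m))
```

Weighting by PC_m means that disagreements at salient structures such as membranes count more than disagreements in flat background.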
LPIPS. LPIPS 17 is a perceptual similarity metric that leverages deep neural networks. It is trained on an extensive human-annotated dataset, which allows it to capture differences between images more precisely as perceived by human observers. LPIPS prioritizes the similarity of deep image features over pixel-wise differences, resulting in improved robustness and alignment with human perception. Consequently, LPIPS has become a widely adopted quality metric in cutting-edge computer vision research 22,23.
Resolution ratio. We compute the resolution of the input, the generated image, and the GT using image decorrelation analysis 24. Image decorrelation analysis cross-correlates, in Fourier space, the normalized Fourier transform of the image with its original Fourier transform using Pearson correlation. Repeating this calculation while applying a binary circular mask of radius $r \in [0, 1]$ to the normalized Fourier transform yields the decorrelation function:

$$d(r) = \frac{\sum_{\mathbf{k}} \mathrm{Re}\left\{ I(\mathbf{k})\, I_n^{*}(\mathbf{k})\, M(\mathbf{k}; r) \right\}}{\sqrt{\sum_{\mathbf{k}} \left|I(\mathbf{k})\right|^2 \sum_{\mathbf{k}} \left|I_n(\mathbf{k})\, M(\mathbf{k}; r)\right|^2}}$$

where $I(\mathbf{k})$ denotes the Fourier transform of the input image, $I_n(\mathbf{k}) = I(\mathbf{k}) / |I(\mathbf{k})|$ its normalized version, $\mathbf{k} = [k_x, k_y]$ the Fourier space coordinates, and $M(\mathbf{k}; r)$ the binary mask of radius $r$ in the frequency domain. The input image is additionally processed with a total of $N_g$ high-pass filters, ranging from mild to strong, to reduce low-frequency energy. For each processed image, a decorrelation function is calculated and its peak position, denoted $r_i$ (local maximum), is extracted.
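A simplified version of the decorrelation function can be computed with NumPy's FFT. This sketch evaluates d(r) on a grid of normalized radii for the unfiltered image only: the sweep over N_g high-pass filters, edge apodization, and local-maximum refinement of the published method are omitted, and the peak is taken with a plain argmax. The conversion resolution ≈ 2 × pixel size / r_peak (with r = 1 at the Nyquist frequency) follows the convention of decorrelation analysis and is stated here as an assumption; the full method uses the largest peak position found across all filtered images.

```python
import numpy as np


def decorrelation_curve(img, n_radii=50):
    """Return normalized radii r in (0, 1] and the decorrelation function d(r)."""
    img = np.asarray(img, dtype=np.float64)
    img = img - img.mean()  # remove the DC term
    I = np.fft.fftshift(np.fft.fft2(img))
    I_n = I / np.maximum(np.abs(I), 1e-12)  # normalized (phase-only) spectrum
    h, w = img.shape
    ky = np.fft.fftshift(np.fft.fftfreq(h))  # cycles/pixel; Nyquist = 0.5
    kx = np.fft.fftshift(np.fft.fftfreq(w))
    KX, KY = np.meshgrid(kx, ky)
    k_norm = np.hypot(KX, KY) / 0.5  # 1.0 at the Nyquist frequency
    energy = np.sum(np.abs(I) ** 2)
    radii = np.linspace(1.0 / n_radii, 1.0, n_radii)
    d = np.zeros(n_radii)
    for idx, r in enumerate(radii):
        M = k_norm <= r  # binary circular mask M(k; r)
        num = np.sum(np.real(I * np.conj(I_n) * M))
        den = np.sqrt(energy * np.sum(np.abs(I_n * M) ** 2))
        d[idx] = num / den if den > 0 else 0.0
    return radii, d


def resolution_estimate(img, pixel_size_nm, n_radii=50):
    """Rough resolution (nm) from the position of the highest peak of d(r)."""
    radii, d = decorrelation_curve(img, n_radii)
    r_peak = radii[np.argmax(d)]
    return 2.0 * pixel_size_nm / r_peak


rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))  # stand-in for an EM image with 3.3 nm pixels
radii, d = decorrelation_curve(img)
print(resolution_estimate(img, pixel_size_nm=3.3))
```

Under these assumptions, a resolution ratio would be obtained as `resolution_estimate(gt, p) / resolution_estimate(pred, p)` for images `gt` and `pred` sharing pixel size `p`.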
Then, we compute the resolution ratio:

$$\text{Resolution ratio} = \frac{\text{Resolution of GT}}{\text{Resolution of prediction}} \tag{17}$$

Fourier ring correlation. The Fourier ring correlation (FRC) between predictions and ground-truth images is plotted using the FIJI plugin (https://imagej.net/plugins/fourier-ring-correlation). The FRC plot reflects the correlation between two images at different spatial frequencies:

$$\mathrm{FRC}(r) = \frac{\sum_{r_i \in r} F_1(r_i) \cdot F_2^{*}(r_i)}{\sqrt{\sum_{r_i \in r} \left|F_1(r_i)\right|^2 \sum_{r_i \in r} \left|F_2(r_i)\right|^2}} \tag{18}$$

where $F_1$ and $F_2$ are the Fourier transforms of the generated image and the ground-truth image, and $r_i$ refers to the pixels on the perimeter of the circle of constant spatial frequency with magnitude $r$.
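The FRC can also be sketched in NumPy, for readers who want to reproduce the curve outside FIJI. This is a minimal version under stated assumptions: rings are one pixel wide, only the real part of the numerator is kept (for real-valued images the imaginary contributions of conjugate-symmetric frequencies cancel), and no windowing or smoothing is applied, so values will not match the FIJI plugin exactly.

```python
import numpy as np


def fourier_ring_correlation(img1, img2):
    """FRC between two equally sized 2D images, one value per integer-radius ring."""
    F1 = np.fft.fftshift(np.fft.fft2(np.asarray(img1, dtype=np.float64)))
    F2 = np.fft.fftshift(np.fft.fft2(np.asarray(img2, dtype=np.float64)))
    h, w = F1.shape
    yy, xx = np.indices((h, w))
    # Distance of each frequency sample from the zero-frequency center, rounded to a ring index.
    ring_index = np.rint(np.hypot(yy - h // 2, xx - w // 2)).astype(int)
    n_rings = min(h, w) // 2
    frc = np.zeros(n_rings)
    for r in range(n_rings):
        ring = ring_index == r
        num = np.sum(F1[ring] * np.conj(F2[ring]))
        den = np.sqrt(np.sum(np.abs(F1[ring]) ** 2) * np.sum(np.abs(F2[ring]) ** 2))
        frc[r] = num.real / den if den > 0 else 0.0
    return frc


rng = np.random.default_rng(0)
gt = rng.standard_normal((64, 64))
pred = gt + 0.5 * rng.standard_normal(gt.shape)  # hypothetical prediction
print(fourier_ring_correlation(pred, gt)[:5])
```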

Isotropic reconstruction baseline implementation details
3D-SRU-Net training:
To our knowledge, no official implementation of 3D-SRU-Net 25 exists, so we implemented it in PyTorch. Our implementation has been uploaded to GitHub (https://github.com/Luchixiang/EMDiffuse/tree/master/3D-SR-Unet). The isotropic volume was cropped into 96 x 96 x 96 subvolumes for training. The initial number of convolution filters was 32. The depth of the model was 3, with three convolution layers at each level. Following the original paper, the Adam optimizer was used with a stepwise, square-root learning rate schedule. The initial learning rate was 1e-4. Cubic-weighted PSNR was used as the loss function. Training was stopped when the validation loss did not decrease for 20 epochs, and the checkpoint with the best performance on the validation set was selected for testing.
CARE training: For Openorganelle mouse kidney isotropic reconstruction, the training and inference volumes were both the downsampled 8 nm x 8 nm x 48 nm kidney volume, the same as for vEMDiffuse-a. The official anisotropic transform function was used to degrade XY-view images for training. Unlike in its common application to fluorescence microscopy images, the PSF kernel was fixed to 1 because of the short wavelength of electrons. The subsampling rate was 6 to match the 48 nm axial resolution. For training, 18,000 patches of size 128 x 128 were cropped from the anisotropic training volume, and 2,000 patches were cropped for validation. The depth of the network was 2. The initial learning rate was 4e-4. MAE was used as the loss function.
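Simulating an anisotropic volume by keeping every sixth z slice (8 nm to 48 nm axial spacing, as for the downsampled kidney volume) and sampling cubic training subvolumes can be sketched as follows. The function names, the uniform random cropping strategy, and the (Z, Y, X) axis order are assumptions for illustration, not the released training code.

```python
import numpy as np


def make_anisotropic(volume, factor=6):
    """Keep every `factor`-th z slice of a (Z, Y, X) volume (e.g. 8 nm -> 48 nm axial spacing)."""
    return volume[::factor]


def random_subvolumes(volume, size=96, n=4, rng=None):
    """Crop `n` cubic subvolumes of edge length `size` at uniformly random positions."""
    rng = np.random.default_rng() if rng is None else rng
    Z, Y, X = volume.shape
    crops = []
    for _ in range(n):
        z = rng.integers(0, Z - size + 1)
        y = rng.integers(0, Y - size + 1)
        x = rng.integers(0, X - size + 1)
        crops.append(volume[z:z + size, y:y + size, x:x + size])
    return np.stack(crops)


rng = np.random.default_rng(0)
iso = rng.standard_normal((192, 128, 128)).astype(np.float32)  # 8 nm isotropic stand-in
aniso = make_anisotropic(iso, factor=6)                          # 48 nm axial spacing
crops = random_subvolumes(iso, size=96, n=2, rng=rng)
print(aniso.shape, crops.shape)  # (32, 128, 128) (2, 96, 96, 96)
```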

Organelle segmentation implementation details
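A 3D U-Net model 26 served as the segmentation expert. We trained two separate models for mitochondria and ER segmentation using the BCE-Dice loss function:

$$\mathcal{L}_{\mathrm{BCEDice}}(A, B) = \mathcal{L}_{\mathrm{BCE}}(A, B) + 1 - \frac{2\left|A \cap B\right|}{\left|A \cup B\right| + \left|A \cap B\right|}$$

where $A$ represents the predicted segmentation mask of the model, $B$ denotes the GT mask, $\mathcal{L}_{\mathrm{BCE}}$ is the binary cross-entropy between them, $\cap$ symbolizes the intersection of the two masks, and $\cup$ represents their union. The segmentation model was trained using the Adam optimizer with an initial learning rate of 2e-4. We used small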
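A soft, differentiable version of this loss is typically computed on predicted probabilities rather than binary masks. The NumPy sketch below uses the identity |A| + |B| = |A ∪ B| + |A ∩ B| for the Dice denominator, adds a small ε for numerical stability (an assumption), and also includes the IoU used to score segmentation masks; it illustrates the formula and is not the training code used for the 3D U-Net.

```python
import numpy as np


def bce_dice_loss(pred, target, eps=1e-7):
    """BCE plus soft Dice loss; pred holds probabilities, target is a binary mask."""
    p = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target, dtype=np.float64)
    bce = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    intersection = np.sum(p * t)              # soft |A ∩ B|
    union_plus_inter = np.sum(p) + np.sum(t)  # |A ∪ B| + |A ∩ B| = |A| + |B|
    dice = (2.0 * intersection + eps) / (union_plus_inter + eps)
    return bce + (1.0 - dice)


def iou(pred_mask, gt_mask):
    """Intersection over Union of two binary masks (1.0 if both are empty)."""
    a = np.asarray(pred_mask, dtype=bool)
    b = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)
```

The BCE term gives per-voxel gradients, while the Dice term counteracts the imbalance between small organelle masks and large background volumes.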

Supplementary Fig. 3
EMDiffuse samples one plausible solution per inference run, and the prediction can be optimized using the mean of different outputs. a, b. EMDiffuse samples one plausible solution from the distribution. In cases where the intricate structure is heavily dominated by noise (a), variance exists among the sampled images (arrows). For less noisy cases (b), outputs are highly consistent. Multiple outputs can be sampled, analyzed individually, or averaged to improve the final prediction. Scale bar, 0.1 μm. c. Assessment of EMDiffuse performance based on FSIM. Shown are the FSIM scores of EMDiffuse-processed images at different noise levels, using the mean of different numbers of outputs. The number of outputs whose mean yields the highest FSIM is selected for the EMDiffuse prediction. d. Quantification of the resolution ratio (resolution of GT/resolution of EMDiffuse prediction) for the mean of different numbers of EMDiffuse outputs at different input noise levels. Scale bar, 0.1 μm. Source data are provided as a Source Data file.

is impacted by random noise in the ground-truth image. a. PSNR measures pixel-wise intensity similarity. The first row shows the low-noise-level input and GT. The second row shows the EMDiffuse prediction as the mean of six outputs (6mean) and without averaging (single). Error maps (third row) and blended images of predictions and error maps (fourth row) indicate that the PSNR score is significantly affected by noise and does not correlate with structural similarity. Scale bar, 0.1 μm. b. In the denoising task, the PSNR value rises with the number of outputs used to calculate the mean prediction. The average of several outputs converges to the minimum mean square error solution of the solution distribution and therefore alleviates the impact of noise and raises the PSNR score. c. Comparison of EMDiffuse with other methods in the denoising task. With the mean of six outputs, EMDiffuse outperforms all other methods in terms of PSNR, but the results are over-smoothed and the resolution is compromised. Source data are provided as a Source Data file.

Supplementary Fig. 5 Additional examples of EMDiffuse-n for denoising EM images and comparison with other denoising methods.
Additional examples of denoising of mouse cortex EM images compared with other denoising methods. Left: a denoising example containing both the noisy input and the EMDiffuse-n prediction. Right: two enlarged regions of the raw input, ground truth (GT), and predictions from the EMDiffuse, CARE, RCAN, PSSR, Noise2Noise (N2N), and Noise2Void (N2V) denoising methods. Resolution values are shown in the top left corner and Fourier power spectra in the bottom left corner of each panel, showing that EMDiffuse generates images with a resolution similar to that of the GT image and effectively reduces noise without undesirable over-smoothing. The uncertainty value is displayed in the bottom right corner. U, uncertainty value. Scale bar, left, 0.3 μm; right, 0.1 μm.
Half-frozen transfer learning for EMDiffuse enables fine-tuning with one training pair. a. Model architecture for EMDiffuse transfer learning. The encoder of the model is frozen to prevent overfitting and reduce the amount of training data required. The bottleneck with attention mechanisms is not frozen. b. FSIM of EMDiffuse denoising performance using different numbers of transfer learning training images. The proposed half-frozen training approach achieved similar performance using only one training image (3 megapixels) across EM images from three different mouse tissues and one cultured HeLa cell sample, overcoming the limitations of a small fine-tuning dataset (n=10).

Supplementary Fig. 9
EMDiffuse outperforms other super-resolution methods across various noise levels in structural and perceptual similarity. a. FSIM of EMDiffuse super-resolution at different noise levels and with different numbers of averaged outputs on the super-resolution dataset. EMDiffuse adaptively selects the number of outputs to average (dynamic mean) based on the noise level of the input image to obtain the highest FSIM score. b. Additional example of an EMDiffuse super-resolution result and the corresponding input image. c. Representative images of one enlarged region of b with GT, inputs at different noise levels, and super-resolution predictions from EMDiffuse, CARE, RCAN, and PSSR. Resolution values and Fourier power spectra are shown in the top left corner of each panel. The uncertainty values are indicated in the EMDiffuse prediction results. U, uncertainty value. Scale bar, (b), 0.3 μm; (c), 0.1 μm.
Supplementary Fig. 12 Additional example of the performance of vEMDiffuse-i in generating XY views with isotropic resolution on the Openorganelle liver dataset.
Another series of input and vEMDiffuse-i predicted XY views (from an 8 nm x 8 nm x 48 nm resolution anisotropic volume) along the z-axis of the Openorganelle liver dataset (jrc_mus-liver). The first column presents the layers of the input (depicted in green) and the vEMDiffuse-i predictions with the uncertainty values (depicted in yellow). The second and third columns show layers of the isotropic volume (GT) and the vEMDiffuse-i predicted isotropic volume for an enlarged region of the first column. U, uncertainty value. Scale bar, first column, 1 μm; second and third columns, 0.3 μm.
vEMDiffuse-i generates isotropic vEM volumes and captures the continuity of ultrastructure along the z-axis in the Openorganelle kidney dataset. Shown are a series of input XY views (from an 8 nm x 8 nm x 48 nm resolution anisotropic volume) and vEMDiffuse-i predicted XY views with isotropic resolution and uncertainty values along the z-axis of the Openorganelle kidney dataset (jrc_mus-kidney). The figure is organized identically to Supplementary Fig. 10b and Supplementary Fig. 11. vEMDiffuse-i's prediction (depicted in yellow) captures the gradual changes in the ultrastructure of mitochondria and mitochondrial cristae, as shown in the ground truth. U, uncertainty value. Scale bar, first column, 400 nm; second and third columns, 40 nm.
Supplementary Fig. 14 vEMDiffuse-i reconstructs anisotropic volumes into isotropic volumes for two vEM datasets. a. XZ views of the ultrastructure of a mouse liver. Shown are two XZ view examples and their enlarged regions from the anisotropic volume (8 nm x 8 nm x 48 nm resolution), interpolated volume, vEMDiffuse-i generated volume, and isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) of the Openorganelle liver dataset (jrc_mus-liver). b. XZ views of the ultrastructure of a mouse kidney. Shown are two XZ view examples and their enlarged regions from the anisotropic volume (8 nm x 8 nm x 48 nm resolution), interpolated volume, vEMDiffuse-i generated volume, and isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) of the Openorganelle kidney dataset (jrc_mus-kidney). Scale bar, 1 μm.

vEMDiffuse-i outperforms other isotropic reconstruction methods. a, b. Example XZ and YZ views and enlarged regions of the anisotropic volume (8 nm x 8 nm x 48 nm resolution), interpolated volume (ITK-cubic), 3D-SRU-Net generated volume, vEMDiffuse-i generated volume, and ground-truth isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) on the Openorganelle mouse kidney dataset (jrc_mus-kidney). Scale bar, 1 μm. c. Violin plots of three metrics (LPIPS, FSIM, and resolution ratio) showing the quantitative performance of vEMDiffuse-i in the isotropic reconstruction task compared with ITK-cubic interpolation and 3D-SRU-Net (n=1000). Source data are provided as a Source Data file.

vEMDiffuse-i generates isotropic vEM volumes and captures the continuity of ultrastructure along the z-axis in the T-cell vEM dataset. Shown are a series of input XY views (from an 8 nm x 8 nm x 48 nm resolution anisotropic T-cell volume) and vEMDiffuse-i predicted XY views with isotropic resolution and uncertainty values along the z-axis of the Openorganelle T-cell dataset (jrc_ctl-id8-2). The figure is organized identically to Supplementary Fig. 10b, Supplementary Fig. 11, and Supplementary Fig. 12.
vEMDiffuse-i's prediction (depicted in yellow) captures the gradual changes in the ultrastructure of the organelles, as shown in the ground truth. U, uncertainty value. Scale bar, first column, 1 μm; second and third columns, 0.3 μm.
Supplementary Fig. 17 vEMDiffuse-i reconstructs anisotropic volumes into isotropic volumes for the T-cell vEM dataset. a. YZ view of the ultrastructure of a T-cell. Shown are one YZ view example and one enlarged region of the anisotropic volume (8 nm x 8 nm x 48 nm resolution), interpolated volume, vEMDiffuse-i generated volume, and isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) from the Openorganelle T-cell dataset (jrc_ctl-id8-2). b. XZ view of the ultrastructure of a T-cell. Shown are one XZ view example and one enlarged region of the anisotropic volume, interpolated volume, vEMDiffuse-i generated volume, and isotropic volume (GT) from the Openorganelle T-cell dataset (jrc_ctl-id8-2). Scale bar, 1 μm.

Supplementary Fig. 18 vEMDiffuse-i is transferable to the brain vEM dataset and captures the continuity of neuron ultrastructure. a. XY views of the anisotropic brain volume (5 nm x 5 nm x 30 nm resolution), vEMDiffuse-i predictions with uncertainty values, and the isotropic brain volume (5 nm x 5 nm x 5 nm resolution) from the EPFL brain dataset. The first column presents the layers of the input (depicted in green) and the vEMDiffuse-i predictions with the uncertainty values (depicted in yellow). The second column displays the same layers in the ground-truth isotropic volume. vEMDiffuse-i captures the gradual appearance of vesicles and discriminates proximal membrane structures in neurons along the z-axis. b, c. YZ view (b) and XZ view (c) of the anisotropic brain volume, vEMDiffuse-i generated isotropic volume, and ground-truth isotropic volume (GT). U, uncertainty value. Scale bar, (a), 0.2 μm; (b), 0.1 μm.

Supplementary Fig. 19 3D reconstruction pipeline and generation of organelle masks on the vEMDiffuse-i reconstructed EM volume and the isotropic EM volume. a. Schematic of the 3D reconstruction pipeline for isotropic vEM datasets generated by vEMDiffuse-i. A segmentation deep learning model is incorporated to segment organelles of interest for the 3D reconstruction of vEM datasets. b. Examples of segmentation results. Two representative XY-view composite images of vEMDiffuse-i generated images and the corresponding GT images, overlaid with organelle segmentation masks generated by the segmentation network. Blue mask, mitochondria; yellow mask, ER. Scale bar, 1 μm.

Supplementary Fig. 20 Reliability of vEMDiffuse-i generated volumes from anisotropic volumes with different axial resolutions. a. Example YZ views of the isotropic volume (8 nm x 8 nm x 8 nm resolution) and of vEMDiffuse-i generated volumes from 8 nm x 8 nm x 48 nm, 8 nm x 8 nm x 56 nm, 8 nm x 8 nm x 64 nm, 8 nm x 8 nm x 72 nm, and 8 nm x 8 nm x 96 nm resolution anisotropic volumes downsampled from the Openorganelle mouse liver dataset (jrc_mus-liver). The Intersection over Union (IoU) scores of ER and mitochondria segmentation results are shown in the bottom right corner. Scale bar, 0.3 μm. b. Violin plots of FSIM for the five generated volumes (n=1000). c. Violin plots of uncertainty values for the five generated volumes (n=1000). Source data are provided as a Source Data file.
vEMDiffuse-a generates isotropic vEM volumes and captures the continuity of ultrastructure along the z-axis with only XZ views as training data. a. Example training images. A series of XZ views with uncertainty values along the y-axis of the downsampled Openorganelle kidney dataset (8 nm x 8 nm x 48 nm resolution, obtained by removing several z slices) is used to train vEMDiffuse-a. b. Example inference images. A series of XY views along the z-axis generated by vEMDiffuse-a. The first column presents the layers of the input (depicted in green) and the vEMDiffuse-a predictions (depicted in yellow). The second and third columns show layers of the isotropic volume (GT) and the vEMDiffuse-a predicted isotropic volume for an enlarged region of the first column. vEMDiffuse-a captures the gradual changes in the ultrastructure of the endoplasmic reticulum (ER) and mitochondria, as shown in the ground-truth slices. U, uncertainty value. Scale bar, first column, 400 nm; second and third columns, 40 nm.
vEMDiffuse-a achieves performance comparable to that of vEMDiffuse-i in generating isotropic volumes of vEM datasets. Example XZ and YZ views and enlarged regions of the anisotropic volume (8 nm x 8 nm x 48 nm resolution), vEMDiffuse-i generated volume, vEMDiffuse-a generated volume, and ground-truth isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) on the Openorganelle kidney dataset (a) and the Openorganelle liver dataset (b). Scale bar, (a), 1 μm; (b), 0.6 μm.
vEMDiffuse-a outperforms other unsupervised isotropic reconstruction methods. a, b. Example XZ and YZ views and enlarged regions of the anisotropic volume (8 nm x 8 nm x 48 nm resolution), interpolated volume (ITK-cubic), CARE generated volume, vEMDiffuse-a generated volume, and ground-truth isotropic volume (GT, 8 nm x 8 nm x 8 nm resolution) on the Openorganelle mouse kidney dataset (jrc_mus-kidney). Scale bar, 1 μm. c. Violin plots of three metrics (LPIPS, FSIM, and resolution ratio) showing the quantitative performance of vEMDiffuse-a in the isotropic reconstruction task compared with ITK-cubic interpolation and CARE (n=1000). Source data are provided as a Source Data file.

Supplementary Fig. 10 EMDiffuse-r can be easily adapted to cultured HeLa cell images.
Examples of EMDiffuse-r transfer learning from the mouse brain cortex dataset to a cultured HeLa cell sample with a pixel size of 2.1 nm. The model is fine-tuned with one well-aligned pair (3 megapixels) of downsampled noisy and high-quality cultured HeLa cell images. U, uncertainty value. Scale bar, 0.15 μm.